Abstract. Starting with the neo-Bayesian revival of the 1950s, many
statisticians argued that it was inappropriate to use Bayesian
methods, and in particular subjective Bayesian methods, in governmental
and public policy settings because of their reliance upon prior
distributions. But the Bayesian framework often provides the primary way
to respond to questions raised in these settings and the numbers and 
diversity of Bayesian applications have grown dramatically in recent 
years. Through a series of examples, both historical and recent, we ar- 
gue that Bayesian approaches with formal and informal assessments of 
priors AND likelihood functions are well accepted and should become 
the norm in public settings. Our examples include census-taking and 
small area estimation, US election night forecasting, studies reported to 
the US Food and Drug Administration, assessing global climate change, 
and measuring potential declines in disability among the elderly. 

Key words and phrases: Census adjustment, confidentiality, disability 
measurement, election night forecasting, Bayesian clinical drug studies, 
global warming, small area estimation. 



1. INTRODUCTION AND HISTORY 

Beginning with the posthumous publication in 1763 
of the essay attributed to the Rev. Thomas Bayes, 
and continuing well into the twentieth century, vir- 
tually the only approach to statistical inference was 
the method of inverse probability based on applica- 
tions of Bayes's theorem (see, e.g., Fienberg, 2006a). 
Nonetheless, most applications of statistical
methods in governmental settings were based primarily
on descriptive statistics and there was little debate 
regarding the relevance of Bayesian approaches in 
public life despite efforts at implementation, for ex- 
ample, Laplace's development of ratio estimation to 
estimate the size of the population of France. 

Criticism of the method of inverse probability, as 
Bayesian methodology was known for almost 
200 years, began in the mid-19th century with the 
rise of a philosophical school advocating objective 
probability. The fundamental concern of the objec- 
tivists was the requirement for a prior distribution 
and they argued for a frequentist view of probabil- 
ity. Unfortunately they failed to present a method- 
ology for inference to counter that of inverse proba- 
bility and it was not until the work of R. A. Fisher 
and Jerzy Neyman and Egon Pearson in the 1920s 
that serious alternative statistical procedures were 
in place. Neyman's (1934) critique of Gini's version 
of the representative method for survey taking not 
only ushered the frequentist repeated sampling per- 
spective into the realm of official statistics, but it 
also introduced the frequentist tool of confidence
intervals and its long-run repeated sampling
interpretation (see Fienberg and Tanur, 1996).

Bayesian tools played an important role in a num- 
ber of statistical efforts during World War II, includ- 
ing Alan Turing's work at Bletchley Park, England, 
to crack the Enigma code, but with the creation 
of such frequentist methods as sequential analysis 
by Barnard in England and Wald in the United 
States and the elaboration of design-based analy- 
ses in sample surveys, as statistics passed the mid- 
century mark, frequentist approaches were in the 
ascendancy in the public arena. This was especially 
true in statistical agencies where the ideas of ran- 
dom selection of samples and repeated sampling as 
the basis of inference were synonymous, and
statistical models and likelihood-based methods were
frowned upon at best.

With the introduction of computers for statistical 
calculations in the 1960s, however, Bayesian meth- 
ods began a slow but prolonged comeback that ac- 
celerated substantially with the introduction of Mar- 
kov chain Monte Carlo (MCMC) methods in the 
early 1990s. Today Bayesian methods are challeng- 
ing the supremacy of the frequentist approaches in 
a wide array of areas of application. 

How do the approaches differ? In frequentist infe- 
rence, tests of significance are performed by suppo- 
sing that a hypothesis is true (the null hypothesis) 
and then computing the probability of observing 
a statistic at least as extreme as the one actually ob- 
served during hypothetical future repeated trials con- 
ditional on the parameters, that is, a p-value. Baye- 
sian inference relies upon direct inferences about pa- 
rameters or predictions conditional on the observa- 
tions. In other words, frequentist statistics examines 
the probability of the data given a model (hypoth- 
esis) and looks at repeated sampling properties of 
a procedure, whereas Bayesian statistics examines 
the probability of a model given the observed data. 
Bayesian methodology relies largely upon Bayes's 
theorem for computing posterior probabilities and 
provides an internally consistent and coherent nor- 
mative methodology; frequentist methodology has 
no such consistent normative framework. Freedman 
(1995) gave an overview of these philosophical posi- 
tions, but largely from a frequentist perspective that 
is critical of the Bayesian normative approach. 
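
The contrast between the two kinds of statement can be made concrete with a small calculation. The sketch below is only an illustration under assumed numbers: 7 successes in 10 binomial trials, a null value of 0.5, and a uniform prior, none of which come from the examples in this article. It computes a one-sided p-value (a probability about data given the hypothesis) and a posterior probability (a probability about the parameter given the data).

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical data: y successes in n trials, null value theta0.
n, y, theta0 = 10, 7, 0.5

# Frequentist: probability, computed under theta = theta0, of observing
# data at least as extreme as y in repeated trials.
p_value = binomial_upper_tail(n, y, theta0)

# Bayesian: with a uniform prior the posterior is Beta(y + 1, n - y + 1).
# For integer parameters, P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
a, b = y + 1, n - y + 1
posterior_prob_above = 1.0 - binomial_upper_tail(a + b - 1, a, theta0)

print(f"one-sided p-value:       {p_value:.4f}")
print(f"P(theta > {theta0} | data):   {posterior_prob_above:.4f}")
```

The two numbers answer different questions, which is why they need not agree: the first conditions on the hypothesis, the second on the observations and the assumed prior.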

The remainder of the article has the following 
structure. In the next section I give a summary of 
some of the most common and cogent criticisms of 
the Bayesian method, especially with regard to its
use in a public context. Then in Section 3, through
a series of examples, both historical and recent, I ar- 
gue that Bayesian approaches with formal and infor- 
mal assessments of priors and likelihood functions 
are well accepted and should become the norm in 
public settings. My examples include US election 
night forecasting, census-taking and small area es- 
timation, studies reported to the US Food and Drug 
Administration, assessing global climate change, and 
measuring declines in disability among the elderly. 
I conclude with a brief summary of challenges
facing broader implementation of Bayesian methods in
public contexts. 

I do not claim to be providing a comprehensive ac- 
count of Bayesian applications but have merely at- 
tempted to illustrate their breadth. One area where 
Bayesian ideas have made serious inroads, both in 
theory and in actual practice, but which I do not
discuss here, is the law (e.g., see Fienberg and
Kadane, 1983; Donnelly, 2005; Taroni et al., 2006; Kadane,
2008). The present article includes a purposeful se- 
lection of references to guide the reader to some of 
the relevant recent Bayesian literature on applica- 
tions in the domains mentioned, but the list is far 
from comprehensive and tends to emphasize work 
closest to my own. 

2. THE ARGUMENTS FOR AND AGAINST 
THE USE OF BAYESIAN METHODS 

Bayesian and frequentist inference in a nutshell: 
It is especially convenient for the present purposes 
to think about Bayes's theorem in terms of density 
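functions. Let $h(y \mid \theta)$ denote the conditional density
of the random variable $Y$ given a parameter value $\theta$
in the parameter space $\Theta$. Then we can go from
the prior distribution for $\theta$, $g(\theta)$, to that associated
with $\theta$ given $Y = y$, $g(\theta \mid y)$, by

(1) $g(\theta \mid y) = h(y \mid \theta)\, g(\theta) \Big/ \sum_{\theta \in \Theta} h(y \mid \theta)\, g(\theta)$

if $\theta$ has a discrete distribution,

(2) $g(\theta \mid y) = h(y \mid \theta)\, g(\theta) \Big/ \int_{\Theta} h(y \mid \theta)\, g(\theta)\, d\theta$

if $\theta$ has a continuous distribution.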
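
As a minimal sketch of the discrete form of Bayes's theorem, equation (1), the following code multiplies a prior by a likelihood and renormalizes. The three candidate parameter values, the equal prior weights, and the binomial data (7 successes in 10 trials) are hypothetical choices made only for illustration.

```python
from math import comb

def discrete_posterior(prior, likelihood):
    """Posterior over a finite parameter space: likelihood times prior, renormalized.

    prior: dict mapping each candidate theta to its prior probability g(theta)
    likelihood: function returning h(y | theta) for the observed y
    """
    unnormalized = {theta: likelihood(theta) * g for theta, g in prior.items()}
    normalizer = sum(unnormalized.values())  # the denominator of equation (1)
    return {theta: w / normalizer for theta, w in unnormalized.items()}

# Hypothetical setup: theta is a success probability restricted to three
# values, and y = 7 successes are observed in n = 10 binomial trials.
prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}
n, y = 10, 7

def binomial_likelihood(theta):
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

posterior = discrete_posterior(prior, binomial_likelihood)
for theta, prob in sorted(posterior.items()):
    print(f"g({theta} | y) = {prob:.3f}")
```

With only three candidate values the normalizing sum is trivial; it is the continuous analogue, the integral in equation (2), that is typically hard to compute in realistic models.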

Bayesians make inferences about the parameters by
looking directly at the posterior distribution $g(\theta \mid y)$
given the data $y$. Frequentists make inferences
about $\theta$ indirectly by considering the repeated
sampling properties of the distribution of the data $y$
given the parameter $\theta$, that is, through $h(y \mid \theta)$.
Bayesians integrate out quantities not of direct
substantive interest and then are able to make
probabilistic inferences from marginal distributions. Most
frequentists use some form of conditioning argument
for inference purposes while others maximize
likelihood functions. Frequentists distinguish between
random variables and parameters which they take to
be fixed and this leads to linear mixed models where 
some of the effects are fixed, that is, are parameters, 
and some are random variables. For a Bayesian all 
linear models are in essence random effects models 
since parameters are themselves considered as
random variables. Thus it is natural for a Bayesian to
consider them to be independent draws from a
common distribution, $g(\theta)$, that is, treating them as
exchangeable following the original argument of de
Finetti (1937). This approach leads naturally to
putting distributions on the parameters of prior
distributions and to what we now call the hierarchical
Bayesian model. It is the normalizing constants [the 
denominators of (1) and (2)] that are notoriously 
difficult to compute and this fact has led, in large 
part, to the use of MCMC methods such as Gibbs 
sampling that involve sampling from the posterior 
distribution. 

A reviewer of an earlier version of this article sug- 
gested that hierarchical models are really not Baye- 
sian, unless one puts a prior at the top level of the hi- 
erarchy. This ignores history. As Good (1965) noted, 
his own use of such ideas draws on work dating back 
at least to the 1920s and the work of W. E. John- 
son whose "sufficientness" postulate implicitly used 
finite exchangeable sequences. And when
non-Bayesians came to recognize the power of such structures
many decades later, they did attempt to emulate the
Bayesian approach, but of course without the clean
Bayesian probabilistic interpretation.

Critique of the Bayesian perspective: The most 
common criticism of Bayesian methods is that, since 
there is no single correct prior distribution, $g(\theta)$, all
conclusions drawn from the posterior distribution 
are suspect. One counter to this argument is that 
published analyses using Bayesian methods should 
consider and report the results associated with a va- 
riety of prior distributions, thus allowing the reader 
to see the effects of different prior beliefs on the 
posterior distribution of a parameter. Others argue 
that one should choose as a prior distribution one 
that in some sense eliminates personal subjectiv- 
ity. Examples of such "objective" priors are those 
that are uniform or diffuse across all possible values
of the parameter, or those that are
"informationless." Berger (2006) and Goldstein (2006) presented
arguments in favor of the objective and subjective 
Bayesian approaches in a forum followed by exten- 
sive discussion. For a discussion of the fruitlessness 
of the search for an objective and informationless 
prior, see the article by Fienberg (2006b). 

There are a number of other features associated 
with the subjective approach including the elicita- 
tion of information for the formulation of prior dis- 
tributions and the use of exchangeability in the de- 
velopment of successive layers of hierarchical mod- 
els. A number of the examples described in the sec- 
tions that follow utilize subjective Bayesian features 
although not always with full elicitation. 

One characteristic of Bayesian inference that weak- 
ens this criticism of the reliance on the prior dis- 
tribution is that the more data we collect, the less 
influence the prior distribution has on the posterior 
distribution relative to that of the data. There are 
situations, however, where even an infinite amount 
of data may not bring two people into agreement 
(see, e.g., Diaconis and Freedman, 1986). 
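
The diminishing influence of the prior in the typical case can be seen in a conjugate example. The sketch below assumes a binomial proportion with two hypothetical Beta priors, one diffuse and one opinionated, and hypothetical data in which 30% of trials succeed at every sample size. It illustrates the usual behavior only, not the pathological situations just mentioned.

```python
def posterior_mean(a, b, successes, n):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)

# Two hypothetical analysts with different prior beliefs.
priors = {"diffuse Beta(1, 1)": (1, 1), "opinionated Beta(20, 5)": (20, 5)}

gaps = []
for n in (10, 100, 10_000):
    successes = 3 * n // 10  # hypothetical data: 30% successes at every n
    means = {
        name: posterior_mean(a, b, successes, n) for name, (a, b) in priors.items()
    }
    gap = abs(means["diffuse Beta(1, 1)"] - means["opinionated Beta(20, 5)"])
    gaps.append(gap)
    print(n, {k: round(v, 3) for k, v in means.items()}, "gap:", round(gap, 4))
```

Reporting the posterior under each of several priors in this way is also the kind of sensitivity analysis suggested above as a response to the criticism that no single prior is correct.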

Another aspect of the Bayesian methodology that 
arises in many applications is the manner in which 
it "borrows strength" when we are estimating many 
parameters simultaneously, especially through the 
use of hierarchical models. This feature, which is 
usually viewed as a virtue, has also been the focal 
point of criticism by frequentists. For example, see 
the commentary by Freedman and Navidi (1986) in 
the context of census adjustment, in which they cri- 
tiqued a Bayesian methodology at least in part be- 
cause it resulted in the use of data from one state to 
adjust the census-based population figures in other 
ones. Today, borrowing strength via cross-area
regression models is common in frequentist circles, and
the Freedman-Navidi argument thus reduces to a
nonstatistical legal issue rather than a statistical one.

For an interesting dialog on different frequentist 
perspectives related to statistical inference, see the 
discussion paper by a group of frequentist statisti- 
cians at Groningen University in The Netherlands, 
Kardaun et al. (2003), which was a response to a se- 
ries of questions posed by David Cox following a lec- 
ture at Groningen. As someone else has noted, it is 
a rare occasion where frequentists seriously enter- 
tain ideas such as those extolled by de Finetti (1937) 
and attempt to reject them. A number of the ques- 
tions discussed in this article arise in the context of 
the examples that follow. 






S. E. FIENBERG 



3. SMALL AREA ESTIMATION AND CENSUS 
ADJUSTMENT 

Small area estimation: As we have already inti- 
mated, small area estimation has been a ripe area for 
Bayesian methods although because so much of the 
literature has been oriented toward national statis- 
tical agency problems, the area is dominated by fre- 
quentist techniques and assessments. Surveys con- 
ducted by national statistical agencies typically gen- 
erate "reliable" information either at national or re- 
gional levels. But the demand for information at 
lower levels of disaggregation is sufficiently great and 
resources tend to be relatively scarce, so that
techniques that supplement the sparse data at the lower
level of disaggregation with data from other sources
or from other areas or domains are essential to
getting estimates with relatively small standard errors.

The big question is with respect to what distri- 
bution are the standard errors computed. There are 
three different answers depending on one's perspec- 
tive. Sampling statisticians most often wish to take 
expectations with respect to the random structure 
in the sampling design. At the other extreme are 
Bayesians for whom the variability is an inherent 
part of the stochastic model structure for the phe- 
nomenon of interest, for example, unemployment or 
crime. And in the middle are model-based likelihood 
statisticians. My argument is that in the context 
of small area estimation the design-based statisti- 
cians were singularly unsuccessful until they emu- 
lated Bayesian ideas of smoothing and borrowing 
strength, but even then they have insisted on av- 
eraging with respect to the sampling design, with 
arguments about robustness of results. 

Jiang and Lahiri (2006) suggested that the prob- 
lem goes back almost a millennium to the eleventh 
century, but interest in formal statistical estimation 
for small areas is a relatively recent phenomenon 
and much of the recent literature can be traced to 
a seminal article by Fay and Herriot (1979) who
used the James-Stein "shrinkage" estimation ideas 
to carry out small area estimation in a frequen- 
tist manner. Given the close relationship between 
such techniques and empirical Bayesian estimation 
(e.g., see Efron and Morris, 1973) and mixed linear 
models, it is a relatively small leap to the use of 
fully Bayesian methodology. But the evolution to- 
ward such methodology documented by Jiang and 
Lahiri has been relatively slow and marked by a
general resistance in statistical agencies to use models
to begin with, let alone Bayesian formulations; for
example, see the descriptions of small area estima- 
tion methodology in the book by Rao (2003), and 
contrast it with the Bayesian hierarchical formula- 
tions in the work of Ballin, Scanu and Vicard (2005) 
and Trevisani and Torelli (2004). 
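
The idea of shrinkage and borrowing strength can be sketched with a simple normal model. The code below is a simplified illustration, not the Fay and Herriot procedure itself: it omits regression covariates and treats the between-area variance as known. The model (each direct estimate normally distributed around its area's true value, and the true values normally distributed around a common mean) and all of the numbers are assumptions made for the example.

```python
def shrinkage_estimates(direct, sampling_var, between_var):
    """Shrink each area's direct estimate toward a precision-weighted overall mean.

    Model assumed for illustration: direct[i] ~ N(theta_i, sampling_var[i]) and
    theta_i ~ N(mu, between_var), with between_var treated as known.
    """
    weights = [1.0 / (d + between_var) for d in sampling_var]
    mu = sum(w * y for w, y in zip(weights, direct)) / sum(weights)
    estimates = []
    for y, d in zip(direct, sampling_var):
        shrink = d / (d + between_var)  # noisier areas borrow more strength
        estimates.append(shrink * mu + (1.0 - shrink) * y)
    return mu, estimates

# Hypothetical unemployment rates (percent) for four small areas.
direct = [4.0, 9.5, 6.0, 12.0]
sampling_var = [0.5, 4.0, 1.0, 9.0]  # large values correspond to small samples
mu, smoothed = shrinkage_estimates(direct, sampling_var, between_var=2.0)

for y, d, s in zip(direct, sampling_var, smoothed):
    print(f"direct {y:5.1f} (variance {d:3.1f}) -> smoothed {s:5.2f}")
```

An area with a precise direct estimate is left nearly unchanged, while an area with a noisy one is pulled strongly toward the overall mean. A fully Bayesian version would also place a prior on the common mean and the between-area variance.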

Census adjustment: What is remarkable about the 
ascendancy of the small area estimation
methodology in the United States is that many of those who
argued for its use opposed the use of essentially the 
same ideas for census adjustment for differential un- 
dercount in the 1980s and 1990s. The basic compo- 
nent of census adjustment in these debates was the 
use of the now standard capture-recapture method- 
ology for population estimation (e.g., see Bishop, 
Fienberg and Holland, 1975, Chapter 6), methodol- 
ogy that has its roots in Laplace's method of ratio 
estimation. Because a second count (the recapture) 
in a census context cannot reasonably be done for 
the nation as a whole, methods that utilize a
sample of individuals were introduced in 1950. To get
small area estimates of population, that is, for
every block in the nation, Ericksen and Kadane (1985)
proposed the use of a Bayesian regression model for 
smoothing. Being fully Bayesian was especially im- 
portant because of the sparseness of the data at their 
disposal for adjustment, based on a sample from 
the Current Population Survey. As we noted above, 
Freedman and Navidi (1986) opposed the use of this 
methodology as did Fay and Herriot's colleagues at 
the US Census Bureau, at least in part on its use of 
models with unverifiable assumptions, and precisely 
because the shrinkage approach embedded in the 
methodology borrowed strength across state bound- 
aries to get sufficiently tight estimates of error. 
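
The capture-recapture logic behind census adjustment can be sketched with the textbook two-sample (dual-system) estimator. This is an illustration of the basic ratio idea only, not the Census Bureau's or Ericksen and Kadane's procedure, and the counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Two-sample capture-recapture estimate of a population total.

    census_count: people counted in the census (the "capture")
    survey_count: people found in the follow-up sample (the "recapture")
    matched: people found in both
    Assumes, for illustration, that the two counts are independent and that
    everyone has the same chance of being counted.
    """
    if matched <= 0:
        raise ValueError("at least one matched individual is required")
    return census_count * survey_count / matched

# Hypothetical area: 900 counted in the census, 100 in the survey, 85 in both.
census_count, survey_count, matched = 900, 100, 85
estimate = dual_system_estimate(census_count, survey_count, matched)
undercount_rate = 1.0 - census_count / estimate

print(f"estimated population: {estimate:.0f}")
print(f"estimated undercount rate: {undercount_rate:.1%}")
```

In small areas the matched count is often tiny, so the raw ratio is very unstable. That instability is what motivates smoothing the estimates with a regression model across areas.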

Ericksen, Kadane and Tukey (1989) presented 
a more refined version of the technical arguments 
looking back to the 1980 census, as well as ahead to 
the 1990 census. For the 1990 census, the US Census 
Bureau essentially proposed the use of a frequen- 
tist approach that had similar structure, at least in 
spirit, to that proposed for 1980, and this was pos- 
sible only by increasing the size of the sample used 
for adjustment purposes by an order of magnitude. 
This plan was opposed largely on political grounds 
as well as by Freedman and colleagues who contin- 
ued to object to the role of statistical models in 
the estimation procedure. A similar controversy en- 
sued as planning for the 2000 census progressed with 
components for adjustment as well as sampling for 
nonresponse followup, and ultimately the Supreme 
Court stepped in and interpreted the Census Act as
banning the use of sampling for this purpose. Ander- 
son and Fienberg (1999) and Anderson et al. (2000) 
provided extensive details on the 1990 and 2000 ad- 
justment controversies. While American politicians 
have eschewed the use of Bayesian and non-Bayesian 
adjustment techniques, statistical agencies in sev- 
eral other countries, such as Argentina, Australia 
and the United Kingdom, have implemented similar 
methodology, although with little emphasis on its 
Bayesian motivation. 

4. ELECTION NIGHT FORECASTING 

In the United States the use of statistical forecast- 
ing of election outcomes based on early reported re- 
turns began in the early 1950s. The CBS television 
network employed one of the early computers, the 
UNIVAC, and the statistician Max Woodbury devel- 
oped a regression-style model that was used success- 
fully to predict the outcome of the 1952 presiden- 
tial election. By 1960, computers had become a ma- 
jor tool of the US television networks in support of 
their election night coverage. Everything was based 
in some form or another on the 150,000+ precincts 
where votes were cast across the US, and attention 
focused on subsets of "key" precincts, chosen in dif- 
ferent ways by the three major networks, and on 
early access to precinct results. The following de- 
scription draws upon that in the article by Fienberg 
(2007). 

In 1960, the RCA Corporation, which owned the
NBC television network, hired CEIR, a statistical
consulting firm, to develop a rapid election night pro- 
jection procedure. CEIR consultants included Max 
Woodbury, and a number of others including John 
Tukey. Computers were still large, expensive and 
slow, and much of what Max Woodbury had done for 
CBS still had to be done by hand. Data of several 
types were available: past history (at various lev- 
els, e.g., county), results of polls preceding the elec- 
tion, political scientists' predictions, partial county 
returns flowing in during the evening, and complete 
results for selected precincts. The data of the anal- 
yses were, in many cases, swings from sets of base 
values derived from past results and from political 
scientists' opinions. It turned out that the impor- 
tant problem of projecting turnout was more dif- 
ficult than that of projecting candidate percentage. 
Starting with the 1962 congressional election, Tukey 
assembled a statistical team to develop the required
methodology and to analyze the results as they
flowed in on election night. Early members of the team
included Bob Abelson, David Brillinger, Dick Link, 
John Mauchly and David Wallace who joined for 
the 1964 primaries. From 1962 through 1966, they 
were consultants to RCA and they interacted with 
the political scientists and one-time Census Bureau 
official Richard Scammon who had his own method- 
ology using a collection of key precinct results. 

David Brillinger (2002) recalled: "Tukey sought 
'improved' estimates. His terminology was that the 
problem was one of 'borrowing strength'." There is 
a remarkably close resemblance between this metho- 
dology and that used for small area estimation. The 
novel feature in the election night context comes 
from the nature of the sparsity — because estimation 
was based on early reported returns. The method- 
ology is now recognizable as hierarchical Bayesian 
with the use of empirical Bayesian techniques at the 
top level. Data flowed in with observations at the
precinct (polling place) level and were aggregated
to the county level, and then to the state as a whole.
Subjective judgment was used in the choice of the 
subsets of "key" precincts and prior distributions 
were typically based on the results of prior state 
elections with the choice being made subjectively to 
capture the political scientists' best judgment about 
which past election most closely resembled the elec- 
tion at hand. As early returns arrived at the com- 
puting central command facility, a team of statis- 
ticians reviewed the actual distribution of early re- 
turns across the state to check for anomalies in light 
of special circumstances and political practices. 

The estimates that really mattered were those at
the state level since the model was used for statewide 
elections for governor and senate positions as well as 
for presidential elections where state outcomes play 
a crucial role. Two models were used: one for pro- 
jecting turnout and the other for projecting the ac- 
tual percentage difference ("swing") between Demo- 
cratic and Republican candidates. The occasional 
rise of serious independent candidates led to model 
extensions and empirical complications. 

Brillinger went on to note: "Jargon was developed; 
for example, there were 'barometric' and 'swing-o- 
metric' precinct samples. The procedures developed 
can be described as an early example of empiri- 
cal Bayes. The uncertainties, developed on a differ- 
ent basis, were just as important as the point esti- 
mates." The variance calculations appeared nowhere 
in the statistical literature and thus they had to 
be derived and verified by members of the team.
This was at about the same time as David Wal- 
lace was working with Frederick Mosteller on their 
landmark Bayesian study of The Federalist Papers, 
which was published in 1964. Tukey's attitude to 
release of the techniques developed is worth com- 
menting on. Brillinger recounted how, on various oc- 
casions, members of his "team" were asked to give 
talks and write papers describing the work. When 
Tukey's permission was sought, his remark was in- 
variably that it was "too soon" and that the tech- 
niques were "proprietary" to RCA and NBC. With 
Tukey's death in 2002, we may well have lost the 
opportunity to learn all of the technical details of 
the work done 40 years earlier. 

Tukey's students and his collaborators began to 
use related ideas on "borrowing strength," for exam- 
ple, in the National Halothane Study of anesthetics 
(Bunker et al., 1969) and for the analysis of
contingency table data (e.g., see Bishop, Fienberg and
Holland, 1975). All of this occurred before the methodology was
described in somewhat different form by I. J. Good 
in his 1965 book and christened as "hierarchical 
Bayes" in the classic 1972 paper by Dennis Lind- 
ley and Adrian Smith. The specific version of hier- 
archical Bayes in the election night model remained 
unpublished, although in an ironic twist, something 
close to it appeared in a paper written by one of 
David Wallace's former students, Alastair Scott, and 
a colleague, Fred Smith (1969, 1971), who were un- 
aware of any of the details of Wallace's work for 
NBC and who developed their approach for different 
purposes! Several other hierarchical Bayesian elec- 
tion night forecasting models have now been used in 
other countries, for example, see the work of Brown, 
Firth and Payne (1997) and Bernardo and Giron 
(1992). 

The methods described here were in use at NBC 
through the 1980 presidential elections. Other net- 
works used different methodology and the statisti- 
cians who worked for the Tukey team were quite 
proud of their record of early and more accurate calls 
of winners than those made by the other networks, 
especially in close elections. With Reagan's land- 
slide presidential victory in 1980, the results were 
seemingly better captured by exit polls and from 
1982 onward NBC switched to the use of exit polls 
in competition and then in collaboration with the 
other television networks. See the article by Fien- 
berg (2007) for further details and a number of the 
recent controversies regarding exit poll forecasting 
and reporting. 



5. BAYESIAN METHODOLOGY AND THE US 
FOOD AND DRUG ADMINISTRATION 

Traditional randomized clinical trials, evaluated 
with frequentist methodology, have long been viewed 
as the bedrock of the drug and device approval
system at the US Food and Drug Administration (FDA).
Over the past couple of decades the drug companies 
and some members of the US Congress have been 
critical of the lengthy FDA review processes that 
have resulted and the enormous expense associated 
with bringing drugs and medical devices to market. 
The statistical literature has also produced Bayesian 
randomized design alternatives (e.g., see
Spiegelhalter, Freedman and Parmar, 1994; Berry, 1991, 1993,
1997; Berry and Stangl, 1996; Simon, 1999), as well 
as ethical critiques of traditional frequentist trials 
(e.g., see Kadane, 1996). Aside from the actual in- 
terpretation of the outcomes in a Bayesian frame- 
work, these and other authors have argued that the 
Bayesian approach can provide faster and more use- 
ful clinical trial information in a wide variety of cir- 
cumstances in comparison with frequentist method- 
ology. 

Bayesian designs and analyses are part of an in- 
creasing number of premarket submissions to FDA's 
Center for Devices and Radiological Health (CDRH). 
This initiative, which began in the late 1990s, takes 
advantage of good prior information on safety and 
effectiveness that is often available for studies of the 
same or similar recent generation devices. In 2006, 
CDRH issued draft guidelines for the use of Bayesian 
statistics in clinical trials for medical devices (FDA, 
2006) and these were finalized in 2010 (FDA, 2010). 
Previous regulatory guidelines have mentioned Baye- 
sian methods briefly, but this was the first broadly 
circulated specific document focusing on Bayesian 
methodologies. The guidelines do, however, place 
considerable onus on the drug companies who wish 
to present Bayesian studies, largely because of justi- 
fiable concerns over selective use of data from within 
studies and the reporting of results. 

As the guidelines make clear, Bayesian formula- 
tions and methods can improve the assessment of 
new drugs and devices by incorporating expert opin- 
ion, results of prior investigations, both experiments 
and observational studies, and synthesizing results 
across concurrent studies. There are sections that 
emphasize the importance of hierarchical models and 
the different roles for exchangeability, for example, 
among patients within trials and among trials. We 
quote from the final guidelines on the role of prior
information: 

We recommend you identify as many sour- 
ces of good prior information as possible. 
The evaluation of "goodness" of the prior 
information is subjective. Because your 
trial will be conducted with the goal of 
FDA approval of a medical device, you 
should present and discuss your choice of 
prior information with FDA reviewers (cli- 
nical, engineering and statistical) before 
your study begins. 

Possible sources of prior information in- 
clude: 

• clinical trials conducted overseas, 

• patient registries, 

• clinical data on very similar products, 

• pilot studies. 

The guidelines go on: 

Prior distributions based directly on data 
from other studies are the easiest to eval- 
uate. While we recognize that two stud- 
ies are never exactly alike, we nonetheless 
recommend the studies used to construct 
the prior be similar to the current study 
in the following aspects: 

• protocol (endpoints, target population, 
etc.), and 

• time frame of the data collection (e.g.,
to ensure that the practice of medicine
and the study populations are
comparable).

In some circumstances, it may be helpful 
if the studies are also similar in investiga- 
tors and sites. Include studies that are fa- 
vorable and nonfavorable. Including only 
favorable studies creates bias. Bias, based 
on study selection may be evaluated by: 

• the representativeness of the studies that 
are included, and 

• the reasons for including or excluding 
each study. 

Prior distributions based on expert opin- 
ion rather than data can be problematic. 
Approval of a device could be delayed or 
jeopardized if FDA advisory panel
members or other clinical evaluators do not
agree with the opinions used to generate
the prior (pages 22-23).

The FDA guidelines include examples of Bayesian 
studies that have met agency review standards. Two 
examples are: 

Example 1 (T-Scan).² T-scan 2000 is a
device to be used as an adjunct to mammography for
patients with equivocal results. The FDA was
presented with an "intended-use" study of 74
consecutive biopsies in Italy. The company combined the
results with those from a prospective double blind
study at seven centers that compared T-scan to T-scan
plus mammography for 504 patients, and the results
from a "targeted" study of 657 biopsy cases at two
centers in Israel, using a Bayesian multinomial
logistic model. It was thereby able to demonstrate
effectiveness in the intended-use context, where there
was otherwise insufficient
information to demonstrate effectiveness. The prior
was chosen to smooth the zero counts but to be rel- 
atively diffuse. The device was approved for this use 
as a consequence in 1999. 

Example 2 (Inter Fix).³ Inter Fix is an implant
device for spinal fusion procedures for patients with
degenerative disc disease and back pain. There were
data available for 139 patients in a randomized
clinical trial, with 77 treated and 62 controls. There
were also 104 nonrandomized subjects treated. An 
interim analysis was performed based on a Bayesian 
predictive model for the future success rate of the 
device, although most of the other analyses reported 
appear to be frequentist in nature. The device was 
approved in 1999 as well. 

CDRH statisticians have been exploring and lecturing 
on important lessons learned in the course of 
the Bayesian initiative for the design, conduct and 
analysis of medical device studies such as the two 
outlined here. 

Although the two studies described above made 
use of the pooling of evidence, in many ways the key 
benefit of Bayesian methods is the ability they offer to 
change the study's course when the welfare of subjects 
is at stake, using what is known as adaptive 
randomization. As Don Berry has argued: 



2 http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfTopic/pma/pma.cfm?num=p970033. 

3 http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfTopic/pma/pma.cfm?num=p970015. 
In a multiyear frequentist study, new patients 
will have the same chance of being 
enrolled in either group, regardless of 
whether the new or old drug is performing 
better. This approach can put patients 
at a disadvantage. A Bayesian model, on 
the other hand, can periodically show researchers 
that one arm is outperforming 
the other and then put more new volunteers 
into the better arm. (Don Berry quoted 
in Beckman, 2006) 
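
The allocation idea in this quotation can be sketched in a few lines. The following is a minimal illustration only, assuming a two-arm trial with binary outcomes, Beta(1, 1) priors on each arm's success rate, and a Thompson-sampling rule for assigning each new patient; the function name `adaptive_allocation` and the success rates are hypothetical, and real adaptive designs typically update in blocks and add stopping rules.

```python
import random


def adaptive_allocation(true_rates, n_patients, seed=0):
    """Simulate Bayesian response-adaptive allocation in a two-arm trial."""
    rng = random.Random(seed)
    # Beta(1, 1) prior on each arm, stored as pseudo-counts (an assumption).
    successes = [1, 1]
    failures = [1, 1]
    assigned = [0, 0]
    for _ in range(n_patients):
        # Draw one plausible success rate per arm from its current posterior.
        draws = [rng.betavariate(successes[a], failures[a]) for a in range(2)]
        arm = 0 if draws[0] >= draws[1] else 1
        assigned[arm] += 1
        # Simulate the patient's outcome and update that arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return assigned, successes, failures


# Hypothetical success rates: arm 1 is the better treatment.
assigned, successes, failures = adaptive_allocation([0.3, 0.7], n_patients=500)
print("patients per arm:", assigned)
```

As evidence accumulates, posterior draws for the better arm win more often, so more of the later patients are assigned to it, which is the behavior the quotation describes.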

As is the case in other applications, at the FDA 
the main criticism of the Bayesian approach is the 
difficulty associated with the choice of the prior. 
Spiegelhalter, Freedman and Parmar (1994) stressed 
the use of different forms of priors such as reference 
priors, "clinical" priors, "skeptical" priors, and en- 
thusiastic priors. The FDA guidelines clearly argue 
against "subjective" expert opinion, but as we know 
from other settings the likelihood function is often 
at least as subjective as is the prior and hierarchical 
Bayesian structures impose substantial constraints 
on the prior and thus the posterior even when one 
uses "diffuse" distributions on the parameters at the 
highest levels of the hierarchy! Moreover, when one 
is drawing upon previous studies, there is always an 
issue of how much "weight" these should receive in 
the prior, especially if the previous studies did not 
involve randomization as in Example 2. 
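
One way to make the "weight" question concrete is to discount the historical data by a factor between 0 and 1 before combining it with the new data. The sketch below assumes a binary success outcome and a Beta prior, and uses a weighting device often called a power prior; this is our illustration, not a method attributed to the FDA guidelines or to either example, and all counts are hypothetical.

```python
def discounted_posterior(hist_success, hist_n, new_success, new_n,
                         weight, prior_a=1.0, prior_b=1.0):
    """Beta posterior with historical counts down-weighted by `weight` in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # weight = 0 ignores the earlier study; weight = 1 pools it fully.
    a = prior_a + weight * hist_success + new_success
    b = prior_b + weight * (hist_n - hist_success) + (new_n - new_success)
    return a, b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical counts, not taken from any FDA submission.
for w in (0.0, 0.5, 1.0):
    a, b = discounted_posterior(80, 104, 50, 77, weight=w)
    print(f"weight={w:.1f}  posterior mean={beta_mean(a, b):.3f}")
```

Running the loop shows how strongly the posterior mean depends on the chosen weight, which is exactly the judgment that has to be defended when the earlier studies were not randomized.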

Unfortunately, as these ideas move to other parts 
of the FDA they are not without controversy. While 
we were completing this article, a new controversy 
over a specific drug made news. Vasogen Inc. an- 
nounced that on Friday, March 14, 2008 it had an 
initial teleconference with the FDA to discuss and 
clarify the recent FDA comments regarding the use 
of a Bayesian approach for ACCLAIM II, a clinical 
trial which is being planned to support an applica- 
tion for US market approval of the Celacade™ Sys- 
tem for the treatment of patients with New York 
Heart Association Class II heart failure. 4 Oversight 
of the drug approval had shifted from CDRH, which 
had issued the guidelines for use of Bayesian 
methods, to the FDA Center for Biologics Evaluation 
and Research (CBER), which has adopted a far more 
cautious approach. How such issues will work 
themselves out remains to be seen. 



4 FDA deals blow to Vasogen's heart treatment, Reuters, 
March 3, 2008. 



Another place at the FDA where Bayesian method- 
ology has recently come into vogue is in the post- 
approval surveillance of drugs and devices, especially 
with regard to side effects. DuMouchel (1999) dis- 
cussed hierarchical Bayesian models for analyzing 
a very large frequency table that cross-classifies ad- 
verse events by type of drug used. Madigan et al. 
(2010) described a more elaborate, large-scale ap- 
proach to the analysis of adverse event data gath- 
ered via spontaneous reporting systems linked to 
claims databases. 

It is worth noting that Bayesian methods have been 
used in innovative ways to study the combination of 
evidence across studies on matters directly before the 
FDA. On the advice of an expert panel, the FDA in 
2004 put a "black-box" warning — its highest warning 
level — on antidepressants for pediatric use especially 
among teenagers. The panel's advice was based not 
on actual suicides, but on indications that suicidal 
thoughts and behaviors increased in some children 
and teens taking newer selective serotonin reuptake 
inhibitor (SSRI)-type antidepressants. Kaizar et al. 
(2006) later addressed the combination of evidence 
using a hierarchical Bayesian meta-analytical ap- 
proach. They concluded that the evidence support- 
ing a causal link between SSRI-type antidepressant 
use and suicidality in children is weak. This will 
clearly be evidence that the FDA will need to con- 
sider when it next reviews this issue, as it surely will, 
because of subsequent observational studies that sug- 
gest teen suicides have increased considerably de- 
spite a substantial decrease in the use of antidepres- 
sants (e.g., see Gibbons et al., 2007). 

Finally we note the extensive applications of a ran- 
ge of Bayesian methods in the related matters of 
health technology assessment as described by Spie- 
gelhalter et al. (2000) and Spiegelhalter (2004). 

6. CONFIDENTIALITY AND THE 
RISK-UTILITY TRADE-OFF 

Protecting the confidentiality of data provided by 
individuals and establishments has been and contin- 
ues to be a major preoccupation of statistical agen- 
cies around the world. Over the past 30 years, statis- 
ticians within and outside a number of major agen- 
cies have worked to cast the confidentiality problem 
as a statistical one, and over the past decade this ef- 
fort has taken on substantial Bayesian overtones as 
the focus has shifted to the trade-off between risk as- 
sociated with protection of confidentiality and the 
utility of databases for different kinds of statistical 
analyses. See the articles in the book by Doyle, 
Theeuwes and Zayatz (2001) for a broad review of 
the literature as it stood about a decade ago. 

Some of the earlier confidentiality literature fo- 
cused on the protection of data against intruders 
or "data snoopers" and Fienberg, Makov and Sanil 
(1997) proposed modeling intruder behavior (and 
thus protection against it) using a subjective Baye- 
sian "matching" model; cf. the discussion of Bayesian 
"matching" methods in the book by D'Orazio, Di 
Zio and Scanu (2006). In 2001, Duncan et al. sug- 
gested a Bayesian approach to the risk-utility trade- 
off problem, which was later generalized in the con- 
text of a formal statistical decision theory model 
by Trottini and Fienberg (2002) and implemented 
in illustrative form by Dobra, Fienberg and Trottini 
(2003) in the context of protecting categorical 
databases. 

More recently, Ting, Fienberg and Trottini (2008) 
contrasted their method of random orthogonal ma- 
trix masking with other microdata perturbation me- 
thods, such as additive noise, from the Bayesian per- 
spective of the trade-off between disclosure risk and 
data utility. This work has yet to be adopted by 
statistical agencies, but related Bayesian modeling 
in the same spirit by Franconi and Stander (2002), 
Polettini and Stander (2004), Rinott and Shlomo 
(2007) and Forster and Webb (2007) has been done 
in close collaboration with those in agencies in Is- 
rael, Italy and the United Kingdom. 

One other Bayesian approach to confidentiality 
protection, which has already seen successful 
penetration into US statistical agencies, is based on the 
multiple imputation approach due originally 
to Donald Rubin and proposed by him for 
application in the context of protecting 
confidentiality in 1993. See the article by Fienberg, Makov 
and Steele (1998) for a related proposal. The basic 
idea is simple although the details of the 
implementation can be complex. We want to replace the 
actual confidential data by simulated data drawn 
from the posterior distribution of a model that cap- 
tures the relationships among the variables to be 
released. Since these "sampled units" are synthetic 
and do not actually correspond to original sample 
members, proponents claim that the resulting data 
protect confidentiality by definition — others point 
out that synthetic people may be close enough to 
"real" sample members for there still to be problems 
of possible re-identification. The method of multiple 
imputation allows one to generate multiple synthetic 
(imputed) samples from the posterior and to 
use these samples to produce estimates of variabil- 
ity that have a frequentist interpretation. Raghu- 
nathan, Reiter and Rubin (2003) and authors of 
a number of subsequent articles described the for- 
malisms of the methodology as well as extensions 
involving only partially imputed data. Because sta- 
tistical agencies in the US were already experiment- 
ing with multiple imputation to deal with missing 
value problems, a number of them have recently ex- 
perimented with this technology for confidentiality 
protection as well. Since the methodology works for 
fairly general classes of prior distributions it could 
utilize, at least in principle, prior information from 
multiple sources as well as expert judgment. 
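
The basic idea can be sketched for a single continuous variable. This is a minimal illustration, assuming a normal model with the standard noninformative prior on the mean and variance; the function names and the simulated "confidential" values are hypothetical, and agency implementations model many variables jointly. The summaries returned at the end are the usual ingredients of multiple imputation inference; the rule for combining them into a variance estimate for synthetic data is given in the articles cited above and is not reproduced here.

```python
import random
import statistics


def draw_synthetic(y, m, rng):
    """Draw m synthetic samples from the posterior predictive of a normal model."""
    n = len(y)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)
    datasets = []
    for _ in range(m):
        # sigma^2 | y is (n - 1) s^2 divided by a chi-square(n - 1) draw;
        # a chi-square(k) variate is Gamma(shape=k/2, scale=2).
        sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2.0, 2.0)
        # mu | sigma^2, y is normal around the observed sample mean.
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Released records are fresh draws, not perturbed originals.
        datasets.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)])
    return datasets


def summarize(datasets):
    """Return the average estimate, between-dataset and within-dataset variance."""
    q = [statistics.mean(d) for d in datasets]
    u = [statistics.variance(d) / len(d) for d in datasets]
    return statistics.mean(q), statistics.variance(q), statistics.mean(u)


rng = random.Random(0)
confidential = [rng.gauss(50.0, 10.0) for _ in range(200)]  # hypothetical data
released = draw_synthetic(confidential, m=20, rng=rng)
q_bar, b_m, u_bar = summarize(released)
print(q_bar, b_m, u_bar)
```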

7. CLIMATE CHANGE AND ITS ABATEMENT 

By now there is hardly a literate person who has 
not heard about global warming and the dire con- 
sequences predicted if we do not change our behav- 
ior regarding the emission of greenhouse gases and 
aerosols. The following statements are typical and 
come from a report to the US Senate by Thomas 
Karl (2001), a senior official in the National Oceanic 
and Atmospheric Administration: 

• The natural "greenhouse" effect is real, and is an 
essential component of the planet's climate pro- 
cess. 

• Some greenhouse gases are increasing in the atmo- 
sphere because of human activities and increas- 
ingly trapping more heat. 

• The increase in heat-trapping greenhouse gases 
due to human activities are projected to be ampli- 
fied by feedback effects, such as changes in water 
vapor, snow cover, and sea ice. 

• Particles (or aerosols) in the atmosphere resulting 
from human activities can also affect climate. 

• There is a growing set of observations that yields 
a collective picture of a warming world over the 
past century. 

• It is likely that the frequency of heavy and ex- 
treme precipitation events has increased as global 
temperatures have risen. 

• Scenarios of future human activities indicate conti- 
nued changes in atmospheric composition through- 
out the 21st century. 

These and similar conclusions have been shared with 
the public by the Intergovernmental Panel on Cli- 
mate Change (IPCC) and the US National Academy 
of Sciences-National Research Council through a series 
of committee reports. Many of the statements 
are backed up by elaborate statistical assessments 
and modeling and over the past decade this work has 
taken on an increasingly Bayesian flavor. There have 
also been challenges to many of these statements, 
despite what the "global warming" proponents de- 
scribe as increasingly strong empirical support. See, 
for example, the report by Wegman, Scott and Said 
(2006) for a statistical critique of some recent mod- 
eling efforts. 

In Figure 1 we reproduce an example of the tem- 
perature reconstruction for the past 2000 years based 
on multiple sources prepared by a panel from the 
National Research Council (2006); see also National 
Academy of Sciences (2008). One thing that is ob- 
vious from this figure is the convergence of the data 
sources for the past 150 years, from the start of the 
industrial revolution, showing temperatures increas- 
ing substantially throughout recent times — this is 
global warming! What is also clear is the uncer- 
tainty associated with these reconstructions going 
back further in time — this is indicated by the shad- 
ing in the background of the figure, with darkness 
associated with greater uncertainty; cf. the article 
by Chu (2005). 

The precise trajectory of the recent increases in 
temperature clearly has substantial uncertainty 
across the data sources and models and it would sur- 
prise few of us to learn that projections from these 
data can vary dramatically. This has recently been 
the focus of intensive Bayesian analysis by a number 
of authors around the world; see, for example, the 
articles by Min and Hense (2006, 2007), and espe- 
cially work in the United States by Berliner, Levine 
and Shea (2000), Tebaldi et al. (2005) and Sanso, 
Forest and Zantedeschi (2008). 

Tebaldi, Smith and Sanso (2010) described a way 
to combine an ensemble of computer simulation mo- 
del results and projections and actual observations 
via hierarchical modeling in order to derive poste- 
rior probabilities of temperature and precipitation 
change at regional scale. They considered the ensem- 
ble of computer models as being drawn from a su- 
perpopulation of such models, and used hierarchical 
Bayesian models to combine results and compute 
the posterior predictive distribution for a new cli- 
mate model's projections along with the uncertainty 
to be associated with them. For a related discussion 
about assessing the uncertainties of projections, see 
the article by Chandler, Rougier and Collins (2010). 
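
The superpopulation idea can be illustrated with a deliberately simplified sketch. It assumes that each climate model's projected change is an exchangeable normal draw around a common mean, uses a flat prior on that mean, and plugs in the sample variance for the between-model variance; these are our simplifications and not the hierarchical model of Tebaldi, Smith and Sanso, which also uses observations and model-specific precisions. The projection values are hypothetical.

```python
import math
import statistics


def predictive_for_new_model(projections):
    """Approximate predictive mean and sd for a new model's projection."""
    n = len(projections)
    center = statistics.mean(projections)
    # Plug-in estimate of between-model variance (an empirical shortcut).
    tau2 = statistics.variance(projections)
    # Predictive spread = between-model spread + uncertainty about the mean.
    sd = math.sqrt(tau2 * (1.0 + 1.0 / n))
    return center, sd


# Hypothetical regional warming projections (degrees Celsius) from six models.
projections = [2.1, 2.8, 3.4, 2.5, 3.0, 3.9]
center, sd = predictive_for_new_model(projections)
print(f"new-model projection: {center:.2f} +/- {sd:.2f}")
```

Even in this toy version, the predictive spread for a new model is wider than the spread among the existing models, because uncertainty about the superpopulation mean is added on top.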



Whether in the context of this work, or in many 
other efforts to forecast future temperatures, Baye- 
sian and non-Bayesian, almost all modeling efforts 
agree that temperatures will continue to rise. The 
principal disagreements concern "by how much" 
and "what would be the impact of various strategies 
for abatement." 

It is worth noting that subjective Bayesian meth- 
ods were proposed for use in climate modeling as 
early as 1997 by Hobbs and the prominence of Baye- 
sian arguments is due not only to statisticians work- 
ing in this area but also to climate modeling special- 
ists such as Schneider (2002), who has noted: 

For three decades, I have been debating 
alternative solutions for sustainable devel- 
opment with thousands of fellow scientists 
and policy analysts — exchanges carried out 
in myriad articles and formal meetings. 
Despite all that, I readily confess a lin- 
gering frustration: uncertainties so infuse 
the issue of climate change that it is still 
impossible to rule out either mild or catas- 
trophic outcomes, let alone provide confi- 
dent probabilities for all the claims and 
counterclaims made about environmental 
problems. 

Even the most credible international as- 
sessment body, the Intergovernmental Pa- 
nel on Climate Change (IPCC), has refu- 
sed to attempt subjective probabilistic esti- 
mates of future temperatures. This has for- 
ced politicians to make their own guesses 
about the likelihood of various degrees of 
global warming. Will temperatures in 2100 
increase by 1.4 degrees Celsius or by 5.8? 
The difference means relatively adaptable 
changes or very damaging ones. . . 
So what then is "the real state of the 
world"? Clearly, it isn't knowable in tradi- 
tional statistical terms, even though sub- 
jective estimates can be responsibly of- 
fered. The ranges presented by the IPCC 
in its peer-reviewed reports give the best 
snapshot of the real state of climate chan- 
ge: we could be lucky and see a mild effect 
or unlucky and get the catastrophic out- 
comes. 

The IPCC assessment builds on formal and informal 
use of subjective assessments of the evidence. 
There is in fact now a tradition in this field of 
elicitation of expert judgments; for example, 
see the articles by Morgan and Keith (1995), Keith 
(1996) and Zickfeld et al. (2007). 

Fig. 1. Smoothed reconstructions of large-scale (Northern Hemisphere mean or global mean) surface temperature variations 
from six different research teams are shown along with the instrumental record of global mean surface temperature. Source: 
Figure S-1, National Research Council (2006), page 2. Reproduced with permission. 

8. DISABILITY AMONG THE ELDERLY 

In the United States, there are no official gov- 
ernment surveys of disability and how it is chang- 
ing over time, but the National Institute on Ag- 
ing (NIA) has funded, with support of other gov- 
ernment agencies, two major longitudinal surveys 
that capture information on disability and link it 
to other data — the Health and Retirement Survey 
(HRS) and the National Long Term Care Survey 
(NLTCS). The original cohort for the NLTCS was 
surveyed in 1982 and there have been subsequent wa- 
ves in 1984, 1989, 1994, 1999 and 2004. The NLTCS 
has been managed by a university-based organiza- 
tion since the late 1980s, but actual data collection 
has been carried out by the US Census Bureau. 
Considerable interest in the NLTCS has focused on a 
series of measures of disability known as "Activities of 
Daily Living" (ADLs) and "Instrumental Activities 
of Daily Living" (IADLs), especially for those in the 
sample exhibiting some dimension of disability on 
a screener question. Erosheva, Fienberg and Joutard 
(2007) studied a cross-sectional version of 16 binary 
ADLs and IADLs, represented in the form of a $2^{16}$ 
contingency table, using a Bayesian latent variable 
model that was developed to be an analogue to the 
frequentist Grade of Membership (GoM) model of 
Manton, Woodbury and Tolley (1994), the likelihood 
function for which is notoriously problematic. 

The Bayesian version of the GoM model utilizes 
hierarchical modeling ideas through a layered latent 
variable structure. Let $x = (x_1, x_2, \ldots, x_J)$ be a 
vector of binary manifest variables. The GoM model is 
structured around $K$ mixture components (extreme 
profiles), and it assigns to each individual a latent 
partial membership vector of $K$ nonnegative random 
variables, $g = (g_1, g_2, \ldots, g_K)$, whose components 
sum to 1. By assigning a distribution $D(g)$ to 
the vector $g$ and integrating, we obtain the marginal 
distribution for individual response patterns in the 
form of individual-level mixtures. Erosheva, Fienberg 
and Joutard explained how to fit this Bayesian 
GoM model using MCMC techniques and apply it 
to the data in the $2^{16}$ contingency table displaying 
outcomes on the 16 ADLs and IADLs, treating these 
different measures of disability as exchangeable, and 
thus as if they were independent and drawn from 
another common distribution. Airoldi et al. (2007, 
2010) explored related aspects of model specification 
and model choice. As with a number of the earlier 
examples, the hierarchical latent structure embedded 
in this modeling approach is a mechanism for 
gaining control over what might otherwise be an un- 
manageable number of parameters and essential to 
the success of the related methods. 
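
The layered structure described above can be sketched generatively. The code below is an illustration under stated assumptions, not the authors' fitting procedure: it takes the distribution on the membership vector to be Dirichlet, introduces a hypothetical table `lam[j][k]` giving the probability of a positive response on item j under extreme profile k, and approximates the integral over the membership vector by simple Monte Carlo rather than MCMC.

```python
import random


def sample_membership(alpha, rng):
    """Draw g from a Dirichlet(alpha) via normalized Gamma variates."""
    raw = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(raw)
    return [r / total for r in raw]


def response_probs(g, lam):
    """P(x_j = 1 | g): membership-weighted mix of the extreme-profile probabilities."""
    return [sum(gk * ljk for gk, ljk in zip(g, row)) for row in lam]


def simulate_individual(alpha, lam, rng):
    """Draw one individual's membership vector and binary response pattern."""
    g = sample_membership(alpha, rng)
    x = [1 if rng.random() < p else 0 for p in response_probs(g, lam)]
    return g, x


def marginal_pattern_prob(x, alpha, lam, rng, draws=5000):
    """Monte Carlo approximation of the marginal probability of pattern x."""
    total = 0.0
    for _ in range(draws):
        g = sample_membership(alpha, rng)
        p = 1.0
        for xj, pj in zip(x, response_probs(g, lam)):
            p *= pj if xj == 1 else 1.0 - pj
        total += p
    return total / draws


rng = random.Random(1)
alpha = [1.0, 1.0]                           # K = 2 extreme profiles (hypothetical)
lam = [[0.05, 0.90], [0.10, 0.80], [0.02, 0.70]]  # J = 3 items (hypothetical)
g, x = simulate_individual(alpha, lam, rng)
print(g, x, marginal_pattern_prob([0, 0, 0], alpha, lam, rng))
```

With only K columns in `lam` and a short parameter vector for the membership distribution, the model describes all $2^J$ response patterns, which is the parameter-control role the text attributes to the hierarchical latent structure.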

This work on disability opens the door to a num- 
ber of challenging problems for the Bayesian mod- 
eling community. For example: 

• How should a Bayesian working with hierarchical 
models such as the Bayesian GoM model incorpo- 
rate the survey weights that arise from the sam- 
pling scheme of the survey and adjustments for 
nonresponse? There is now an extensive literature 
that provides conflicting advice on the use of sur- 
vey weights in the Bayesian framework, but the 
hierarchical model complexities bring these issues 
into somewhat sharper focus in this setting; for 
example, see the contrasting arguments of Fien- 
berg (2009) and Little (2009). 

• Manrique-Vallier and Fienberg (2010) extended 
these ideas to longitudinal latent profiles applied 
to the six ADLs measured across all six waves of 
the survey, and Manrique-Vallier (2010) added in 
survival and generational effects to address the 
question of whether disability is increasing or de- 
creasing over time. He appeared to be able to 
capture characteristics that others have addressed 
using comparisons across cross-sections for each 
wave of the survey (see, e.g., Manton and Gu, 
2001; Manton, Gu and Lamb, 2006). Scaling these 
methods up to the full array of ADLs and IADLs 
with key covariates remains a major challenge. 
This is a matter of considerable interest to policy 
planners who are interested in forecasting future 
demands on the health-care infrastructure as a re- 
sult of changes in long-term disability over time. 

The Bayesian GoM model is a special case of 
a much larger class of mixed membership models 
that can be used to analyze a diverse array of data 
types ranging from text in documents to images, to 
linkages in networks, and longitudinal versions may 
prove applicable in other settings beyond the study 
of disability. 

9. CONCLUSION 

For much of the twentieth century, approaches to 
the design and analysis of statistical studies in government 
settings and public policy were almost exclusively 
descriptive or dominated by the frequentist 
approach that followed from the work of Fisher and 
from Neyman and Pearson. With the neo-Bayesian 
revival of the 1950s, Bayesian methods and tech- 
niques slowly began to appear in the public arena, 
and their use has accelerated dramatically during 
the past two decades, especially with the rise of 
MCMC methods that have allowed for the sampling 
from posterior distributions in settings involving very 
large datasets. 

In this article, we have attempted to give some 
examples, both old and new, of Bayesian methods 
in statistical practice in government and public pol- 
icy settings and to suggest why in most of the cases 
there was ultimately little or no resistance to the 
Bayesian approach. Our examples have included cen- 
sus-taking and small area estimation, US election 
night forecasting, studies reported to the US Food 
and Drug Administration, assessing global climate 
change and measuring declines in disability among 
the elderly. Their diversity suggests that there is 
growing recognition of the value of Bayesian results, 
and a realization that the approach deals directly 
with questions of substantive interest. 

Where there has been controversy, it has largely fo- 
cused on the role of the choice of prior distributions 
and the appropriateness of "borrowing strength" 
across geographic boundaries. Arguments in favor of 
the use of "objective" priors have done little to stem 
the frequentist criticism of Bayesian methods, and 
typically ignore the highly subjective aspects of 
hierarchical structures and likelihood functions. 
Through the examples discussed here, we have 
tried to convey the fact that a pragmatic Bayesian 
approach inevitably includes many subjective ele- 
ments, although prior distributions may well draw 
on data from related settings and have an empirical 
flavor to them. Nonetheless, the principal challenge 
to Bayesian methods that remains is the need to 
constantly rebut the notion that frequentist meth- 
ods are "objective" and thus more appropriate for 
use in the public domain. 

In other areas of statistical application Bayesian 
methodology has also seen a major resurgence and 
this is especially true in connection with machine 
learning approaches to very large datasets, where 
the use of hierarchically structured latent variable 
models is essential to generating high-quality esti- 
mates and predictions. 